24 research outputs found

    A multi-layered Bayesian network model for structured document retrieval

    New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented.

    Ranking structured documents using utility theory in the Bayesian network retrieval model

    In this paper, a new method based on Utility and Decision theory is presented to deal with structured documents. These methodologies are applied to refine a first ranking of structural units, generated by means of an Information Retrieval Model based on Bayesian Networks. Units are rearranged in the new ranking by combining their posterior probabilities, obtained in the first stage, with the expected utility of retrieving them. The experimental work has been carried out using the Shakespeare structured collection, and the results show that the new approach improves retrieval effectiveness.

    A systematic approach to normalization in probabilistic models

    Open access funding provided by Austrian Science Fund (FWF). This research was partly supported by the Austrian Science Fund (FWF) Project Number P25905-N23 (ADmIRE). This work has been supported by the Self-Optimizer project (FFG 852624) in the EUROSTARS programme, funded by EUREKA, the BMWFW and the European Union.

    Towards a Better Understanding of the Relationship between Probabilistic Models in IR

    Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. We instead define a new PR model based on logistic regression, which has a score function similar to that of the query likelihood model. The performance of both models is strongly correlated, hence providing a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens up ample new research directions, which we propose as future work.

    Opinion-aware retrieval models based on sentiment and intensity of lexical features

    Sentiment analysis has received much attention in Information Retrieval (IR) and other domains, including data mining, machine learning, and NLP. However, when it comes to big data, incorporating the sentiment of words into IR models becomes even more important, and as yet no widely accepted standard exists for this task. The contribution of this paper is a framework for quantifying term frequency (TF) variants with sentiments. We propose models derived from the strength of lexical features to improve sentiment-based ranking.

    Explicitly considering relevance within the language modeling framework

    Whilst the event of relevance is central to the Binary Independence Retrieval model, Language Modeling focuses on the estimation of the document model. In this paper, we review past formulations of the Language Modeling (query likelihood) approach. We find that these previous formulations largely ignore relevance by making implicit or explicit assumptions. The main contribution of this work is an alternative formulation that specifically relates relevance and language modeling in a sound probabilistic framework. This leads to valuable insights into the application of Language Modeling to Information Retrieval, including how the approach handles relevance information and how the approach can be further developed.

    ADOR: A New Medical Dataset for Sentiment-based IR

    Sentiment analysis has received attention in retrieval applications. Combining opinions such as user feelings with semantics would enhance the performance of these applications, especially when the level of urgency is essential, e.g., in the medical domain. However, no medical benchmark is widely known for evaluating sentiment-aware IR. In this paper, we create a dataset based on Amazon reviews for medical products and make it publicly available. To assess the compatibility of the benchmark with opinions and concepts, we propose a sentiment-aware extension of TF.IDF and apply it to the dataset. This model is derived from linear combinations of a sentiment-based TF.IDF score with term-based and conceptual TF.IDF scores. The benchmark could help healthcare organizations to effectively detect, rank and filter the most urgent notifications based on patients' health status, narratives and conditions.
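
    The linear combination mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives neither weights nor the exact form of each score, so the function names, the weight values and the toy scores below are all assumptions made for illustration; this is not the authors' implementation.

```python
# Hypothetical sketch: rank documents by a weighted sum of three
# precomputed TF.IDF-style scores (term-based, conceptual, sentiment-based).
# The weights and the toy scores are illustrative assumptions only.

def combined_score(term_score, concept_score, sentiment_score,
                   weights=(0.5, 0.3, 0.2)):
    """Return the weighted linear combination of the three scores."""
    w_term, w_concept, w_sentiment = weights
    return (w_term * term_score
            + w_concept * concept_score
            + w_sentiment * sentiment_score)


def rank(doc_scores, weights=(0.5, 0.3, 0.2)):
    """Sort document ids by combined score, highest first."""
    return sorted(
        doc_scores,
        key=lambda doc_id: combined_score(*doc_scores[doc_id], weights=weights),
        reverse=True,
    )


# Toy input: doc id -> (term score, concept score, sentiment score).
toy_scores = {
    "review_a": (2.0, 1.0, 0.5),
    "review_b": (1.0, 1.0, 4.0),
}

print(rank(toy_scores))                           # sentiment contributes
print(rank(toy_scores, weights=(1.0, 0.0, 0.0)))  # term score only
```

    With these toy numbers, giving the sentiment score a non-zero weight moves the strongly opinionated review ahead of the one that only matches the query terms better.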

    A Descriptive Approach to Classification

    Nowadays, information systems are required to be more adaptable and flexible than before to deal with the rapidly increasing quantity of available data and changing information needs. Text Classification (TC) is a useful task that can help to solve different problems in different fields. This paper investigates the application of descriptive approaches for modelling classification. The main objectives are increasing abstraction and flexibility so that expert users are able to customise specific strategies for their needs. The contribution of this paper is two-fold. Firstly, it illustrates that modelling classifiers in a descriptive approach is possible and leads to a definition close to the mathematical formulation. Moreover, the automatic translation from PDatalog to mathematical formulation is discussed. Secondly, quality and efficiency results demonstrate the feasibility of the approach for real-scale collections.